Research and Realization of the Extensible Data Cleaning Framework

Author

  • Gao Wei
Abstract

This paper proposes an extensible data cleaning framework built on the key technologies of data cleaning. The framework includes an open rules library and algorithms library. The paper describes the model principle and working process of the framework and verifies its validity by experiment. During cleaning, all the errors in the data source can be cleaned according to the specific business by applying predefined cleaning rules and choosing an appropriate algorithm. The final stage of the implementation provides an initial version of the basic functions of the framework's data cleaning module, and the experiment verifies that the framework has good efficiency and operating results.
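
The abstract does not give implementation details, so the following is only a minimal sketch of how an open rules library and an open algorithms library might fit together. Every name here (`Rule`, `CleaningFramework`, `register_algorithm`, `add_rule`, `clean`) is hypothetical and is not taken from the paper. The sketch assumes that an algorithm is a function applied to a single field value and that a rule binds one field to one registered algorithm.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Algorithm = Callable[[Any], Any]  # takes a field value, returns the cleaned value


@dataclass
class Rule:
    """Hypothetical cleaning rule: apply a named algorithm to one field."""
    field_name: str
    algorithm: str


@dataclass
class CleaningFramework:
    """Hypothetical framework holding an algorithms library and a rules library."""
    algorithms: Dict[str, Algorithm] = field(default_factory=dict)
    rules: List[Rule] = field(default_factory=list)

    def register_algorithm(self, name: str, func: Algorithm) -> None:
        # Extending the algorithms library is just adding a named function.
        self.algorithms[name] = func

    def add_rule(self, rule: Rule) -> None:
        # Reject rules that refer to an algorithm nobody has registered.
        if rule.algorithm not in self.algorithms:
            raise KeyError(f"unknown algorithm: {rule.algorithm}")
        self.rules.append(rule)

    def clean(self, records: List[Record]) -> List[Record]:
        # Apply every rule, in order, to a copy of each record.
        cleaned = []
        for record in records:
            row = dict(record)
            for rule in self.rules:
                if rule.field_name in row:
                    func = self.algorithms[rule.algorithm]
                    row[rule.field_name] = func(row[rule.field_name])
            cleaned.append(row)
        return cleaned


if __name__ == "__main__":
    framework = CleaningFramework()
    framework.register_algorithm(
        "strip_whitespace", lambda v: v.strip() if isinstance(v, str) else v
    )
    framework.register_algorithm(
        "null_to_default", lambda v: "unknown" if v in (None, "") else v
    )
    framework.add_rule(Rule("name", "strip_whitespace"))
    framework.add_rule(Rule("city", "null_to_default"))
    print(framework.clean([{"name": "  Alice ", "city": ""}]))
```

Keeping rules as data that refer to algorithms by name is one way to let business-specific rules change without touching the algorithm code; the paper may organize its libraries differently.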

Similar resources

An Extensible Framework for Data Cleaning

We propose an extensible data cleaning tool, named AJAX, that supports the specification and efficient execution of complex data cleaning programs.

Declarative Support for Sensor Data Cleaning

Pervasive applications rely on data captured from the physical world through sensor devices. Data provided by these devices, however, tend to be unreliable. The data must, therefore, be cleaned before an application can make use of them, leading to additional complexity for application development and deployment. Here we present Extensible Sensor stream Processing (ESP), a framework for buildin...

XML based Framework for ETL Processes For Relational Databases

In Data Warehousing, Extraction-Transformation-Loading (ETL) are the key tasks that are responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse [10]. More specifically, ETL tools are a category of specialized tools with the task of dealing with data warehouse cleaning and loading problems. These tasks are very critical in every d...

TAILOR: A Record Linkage Tool Box

Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...

Parametric study of a viscoelastic RANS turbulence model in the fully developed channel flow

One of the newest viscoelastic RANS turbulence models for drag-reducing channel flow with polymer additives is studied under different flow and rheological properties. In this model, the finitely extensible nonlinear elastic-Peterlin (FENE-P) constitutive model is used to describe the viscoelastic effect of the polymer solution, and the turbulence model is developed in the $k$-$\epsilon$-$\overline{v^2}$-$f$ framework. The geome...

Publication date: 2015